y = β₀ + β₁x
Where:
β₀ is the intercept.
β₁ is the slope.
y = β₀ + β₁x + β₂x²
Where:
β₂ is the coefficient of the squared term.
The Curve:
The x² term introduces a curve into the relationship.
If β₂ is positive, the curve opens upward (like a U).
If β₂ is negative, the curve opens downward (like an inverted U).
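As a reading aid, a short derivation (assuming β₂ ≠ 0) shows what the squared term changes: the effect of x on y is no longer constant, and the curve has a single turning point.

```latex
\frac{dy}{dx} = \beta_1 + 2\beta_2 x,
\qquad
\frac{dy}{dx} = 0 \;\Longrightarrow\; x^{*} = -\frac{\beta_1}{2\beta_2}
```

When β₂ is negative, x* is the point where y peaks; when β₂ is positive, it is the point where y bottoms out.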
# Descriptive statistics
Cleaned_4_MMDAs_Data %>% skim(Population)
| Name | Piped data |
| Number of rows | 40 |
| Number of columns | 70 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Population | 0 | 1 | 1376317 | 1156872 | 174370 | 348984.8 | 657000 | 2263250 | 3630000 | ▇▁▃▂▂ |
Cleaned_4_MMDAs_Data %>% skim(IGF)
| Name | Piped data |
| Number of rows | 40 |
| Number of columns | 70 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| IGF | 0 | 1 | 18488415 | 13590300 | 945774.9 | 3004224 | 19947114 | 24508545 | 55200507 | ▇▇▇▁▁ |
# Histograms
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Population", x = "Population", y = "Frequency") +
scale_x_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = IGF)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of IGF Revenue", x = "IGF Revenue", y = "Frequency") +
scale_x_continuous(labels = comma)
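# Growth Rate (Percentage)
# Note: diff() runs down the whole column, so this assumes the rows form a single series ordered by Year.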
Cleaned_4_MMDAs_Data <- Cleaned_4_MMDAs_Data %>%
mutate(
Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
IGF_Growth_Rate = c(NA, diff(IGF) / IGF[-length(IGF)] * 100)
)
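The `diff()` calculation treats each column as one continuous time series. If the 40 rows stack several MMDAs (an assumption, since the grouping column is not shown in this section), the first growth rate of each district would be computed against the last year of the previous district. A minimal sketch on synthetic data, using a hypothetical `District` column and base R only, shows one way to compute growth within each district:

```r
# Synthetic panel: 2 districts x 3 years (illustrative values, not the study data)
panel <- data.frame(
  District   = rep(c("A", "B"), each = 3),  # hypothetical identifier column
  Year       = rep(2012:2014, times = 2),
  Population = c(100, 110, 121, 1000, 1050, 1155)
)

# Percentage growth within one series; the first year has no predecessor
growth_rate <- function(x) c(NA, diff(x) / head(x, -1) * 100)

# ave() applies growth_rate separately within each district and keeps the row order
panel$Population_Growth_Rate <- ave(panel$Population, panel$District, FUN = growth_rate)

print(panel)
```

The rows must already be sorted by year within each district. With dplyr, the same idea is a `group_by()` on the identifier before the `mutate()` call.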
# Plot of Trends
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Population)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Trends in Population Growth ",
x = "Year (2012-2022)",
y = "Population"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = IGF)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Trends in IGF Revenue (Ghana Cedis) Growth ",
x = "Year (2012-2022)",
y = "IGF Revenue (Ghana Cedis)"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = IGF)) +
geom_point(color = "blue") +
labs( title = "Population vs. IGF Revenue",
x = "population", y = "IGF Revenue (Ghana Cedis)") +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
The histograms show that both population and IGF revenue are unevenly distributed. The scatter plot shows three clusters of observations and suggests that IGF revenue tends to increase as population increases.
mod1 <- lm(IGF ~ Population, data = Cleaned_4_MMDAs_Data)
summary(mod1)
##
## Call:
## lm(formula = IGF ~ Population, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12729616 -11966444 -1782121 7985036 33634606
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11547811.860 3078507.762 3.751 0.000586 ***
## Population 5.043 1.721 2.930 0.005708 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12430000 on 38 degrees of freedom
## Multiple R-squared: 0.1843, Adjusted R-squared: 0.1628
## F-statistic: 8.584 on 1 and 38 DF, p-value: 0.005708
Cleaned_4_MMDAs_Data %>%
ggplot(aes(x = Population, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and IGF Revenue") +
scale_y_continuous(labels = scales::comma)
# The Quadratic Term
Cleaned_4_MMDAs_Data$Population_Squared <- Cleaned_4_MMDAs_Data$Population^2
# Quadratic Regression
mod_quad <- lm(IGF ~ Population + Population_Squared, data = Cleaned_4_MMDAs_Data)
summary(mod_quad)
##
## Call:
## lm(formula = IGF ~ Population + Population_Squared, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13667977 -12443856 -1170177 8596836 30208431
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7835768.916689358 4321321.203248335 1.813 0.0779 .
## Population 13.905555380 7.484822769 1.858 0.0712 .
## Population_Squared -0.000002653 0.000002181 -1.216 0.2316
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12360000 on 37 degrees of freedom
## Multiple R-squared: 0.2156, Adjusted R-squared: 0.1732
## F-statistic: 5.086 on 2 and 37 DF, p-value: 0.01118
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) + # Use formula for quadratic
labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Quadratic Relationship between Population and IGF Revenue") +
scale_y_continuous(labels = comma)
Linear Regression:
Interpretation:
The linear model shows a statistically significant positive relationship between Population and IGF (slope = 5.043, p = 0.005708). However, the Multiple R-squared of 0.1843 indicates that Population explains only 18.43% of the variance in IGF, and the Adjusted R-squared (0.1628) is similarly low.
Quadratic Regression:
Interpretation: The quadratic model is statistically significant overall (F = 5.086, p = 0.01118), but neither individual term is significant at the 5% level (Population: p = 0.0712; Population_Squared: p = 0.2316). R-squared improves only slightly, from 0.1843 to 0.2156 (Adjusted R-squared from 0.1628 to 0.1732).
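Whether the gain in R-squared justifies the extra term can be checked with a partial F-test. The sketch below rebuilds that test from the rounded values printed in the two `summary()` outputs, so the result is only approximate. For a single added term, the square root of F should match the absolute t value of that term.

```r
# Values copied from the two summary() outputs above (rounded as printed)
r2_linear <- 0.1843   # Multiple R-squared, IGF ~ Population
r2_quad   <- 0.2156   # Multiple R-squared, IGF ~ Population + Population_Squared
df_resid  <- 37       # residual degrees of freedom of the quadratic model

# Partial F statistic for adding one term (numerator df = 1)
f_stat  <- ((r2_quad - r2_linear) / 1) / ((1 - r2_quad) / df_resid)
p_value <- pf(f_stat, df1 = 1, df2 = df_resid, lower.tail = FALSE)

# sqrt(F) should be close to |t| = 1.216 reported for Population_Squared
print(c(F = f_stat, abs_t = sqrt(f_stat), p = p_value))
```

With the fitted objects available, `anova(mod1, mod_quad)` performs the same comparison without rounding error.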
# Residual
ggplot(data = data.frame(residuals = residuals(mod1), fitted = fitted(mod1)), aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Linear) ", x = "Fitted Values", y = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod1)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals(Linear)", x = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod1)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals (Linear)")
# Residuals vs. Fitted Values
ggplot(data = data.frame(residuals = residuals(mod_quad), fitted = fitted(mod_quad)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Quadratic Model)", x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals (Quadratic Model)", x = "Residuals")
# Q-Q Plot of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals (Quadratic Model)")
shapiro.test(resid(mod1))
##
## Shapiro-Wilk normality test
##
## data: resid(mod1)
## W = 0.89188, p-value = 0.001119
shapiro.test(resid(mod_quad))
##
## Shapiro-Wilk normality test
##
## data: resid(mod_quad)
## W = 0.90865, p-value = 0.003446
# Durbin-Watson Test (Autocorrelation)
dwtest(mod1)
##
## Durbin-Watson test
##
## data: mod1
## DW = 0.34349, p-value = 0.000000000001035
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(mod_quad)
##
## Durbin-Watson test
##
## data: mod_quad
## DW = 0.41706, p-value = 0.00000000001975
## alternative hypothesis: true autocorrelation is greater than 0
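To make the Durbin-Watson values easier to read, the sketch below computes the statistic directly from a residual vector and applies the usual approximation DW ≈ 2(1 − ρ), where ρ is the first-order autocorrelation of the residuals. The helper names `dw_stat` and `implied_rho` are illustrative, and the two short residual vectors are toy inputs, not study data.

```r
# Durbin-Watson statistic computed directly from a residual vector
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# First-order autocorrelation implied by the approximation DW ~ 2 * (1 - rho)
implied_rho <- function(dw) 1 - dw / 2

# Applied to the two statistics reported above
print(implied_rho(c(linear = 0.34349, quadratic = 0.41706)))

# Toy residuals: runs of the same sign push the statistic below 2,
# alternating signs push it above 2
print(dw_stat(c(1, 1, 1, -1, -1, -1)))
print(dw_stat(c(1, -1, 1, -1, 1, -1)))
```

Values near 2 indicate little first-order autocorrelation. Because the statistic depends on row order, a low value in stacked district-by-year data may partly reflect persistent differences between districts rather than purely year-to-year dependence.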
# Breusch-Pagan Test (Homoscedasticity)
bptest(mod1)
##
## studentized Breusch-Pagan test
##
## data: mod1
## BP = 0.009074, df = 1, p-value = 0.9241
bptest(mod_quad)
##
## studentized Breusch-Pagan test
##
## data: mod_quad
## BP = 7.6576, df = 2, p-value = 0.02174
# Variance Inflation Factor (VIF) - Multicollinearity
# VIF applies only to the quadratic model; the linear model has a single predictor.
vif(mod_quad)
## Population Population_Squared
## 19.1496 19.1496
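Both the linear and quadratic models violate the assumptions of linear regression. The Shapiro-Wilk tests reject normality of the residuals (p = 0.001119 and p = 0.003446), and the Durbin-Watson tests indicate strong positive autocorrelation (DW = 0.34349 and 0.41706). The quadratic model additionally shows heteroscedasticity (Breusch-Pagan p = 0.02174) and high multicollinearity between Population and Population_Squared (VIF = 19.15), whereas the Breusch-Pagan test does not reject constant variance for the linear model (p = 0.9241).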
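A high VIF is expected whenever a variable and its square enter the same model. With exactly two predictors, VIF = 1 / (1 − r²), so a VIF of 19.15 implies a correlation of about 0.97 between Population and Population_Squared. The sketch below uses a synthetic, evenly spaced predictor (not the study data) and a hypothetical helper `vif_two` to show how centering the variable before squaring can remove most of this collinearity.

```r
# Synthetic, evenly spaced predictor on roughly the scale of Population (not the study data)
x <- seq(2e5, 3.6e6, length.out = 40)

# With exactly two predictors, VIF = 1 / (1 - r^2), where r is their correlation
vif_two <- function(a, b) 1 / (1 - cor(a, b)^2)

vif_raw      <- vif_two(x, x^2)                    # raw term and its square
x_centered   <- x - mean(x)
vif_centered <- vif_two(x_centered, x_centered^2)  # centered term and its square

print(c(raw = vif_raw, centered = vif_centered))
```

Centering changes the interpretation of the linear coefficient but not the fitted curve. For skewed data such as Population the reduction will be smaller than in this symmetric example.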
# Transformed Model
log_log_mod <- lm(log(IGF) ~ log(Population), data = Cleaned_4_MMDAs_Data)
summary(log_log_mod)
##
## Call:
## lm(formula = log(IGF) ~ log(Population), data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1580 -1.2204 0.2704 0.9180 1.5090
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.4923 2.5313 4.145 0.000183 ***
## log(Population) 0.4197 0.1844 2.276 0.028594 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.178 on 38 degrees of freedom
## Multiple R-squared: 0.1199, Adjusted R-squared: 0.09677
## F-statistic: 5.178 on 1 and 38 DF, p-value: 0.02859
# Scatter Plots (Transformed Data)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Ln_Pop, y = Ln_IGF)) +
geom_point() +
geom_smooth(method = "lm") +
labs(title = "Log(Population) vs. Log(IGF Revenue)", x = "Log(Population)", y = "Log(IGF Revenue)")
sqrt_model <- lm(sqrt(IGF) ~ Population, data = Cleaned_4_MMDAs_Data)
summary(sqrt_model)
##
## Call:
## lm(formula = sqrt(IGF) ~ Population, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2236.02 -1673.40 -59.75 1421.85 3075.75
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2909.7332986 402.2778935 7.233 0.0000000119 ***
## Population 0.0007270 0.0002249 3.232 0.00254 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1625 on 38 degrees of freedom
## Multiple R-squared: 0.2157, Adjusted R-squared: 0.195
## F-statistic: 10.45 on 1 and 38 DF, p-value: 0.002539
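Both transformed models are statistically significant: the log-log model (slope = 0.4197, p = 0.02859, R-squared = 0.1199) and the square root model (slope = 0.000727, p = 0.002539, R-squared = 0.2157).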
# Function to perform diagnostic tests and plots
perform_diagnostics <- function(model, model_name) {
# Residuals vs. Fitted
plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")
# Q-Q Plot of Residuals
plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))
# Durbin-Watson Test
dw_test <- dwtest(model)
print(paste("Durbin-Watson Test (", model_name, "):"))
print(dw_test)
# Breusch-Pagan Test
bp_test <- bptest(model)
print(paste("Breusch-Pagan Test (", model_name, "):"))
print(bp_test)
# Print VIF (if applicable)
if (length(coef(model)) > 2) { # Check for multiple predictors
vif_result <- vif(model)
print(paste("VIF (", model_name, "):"))
print(vif_result)
}
# Arrange plots
grid.arrange(plot1, plot2, plot3, nrow = 1)
}
# Perform diagnostics for each model
perform_diagnostics(mod1, "Linear Model")
## [1] "Durbin-Watson Test ( Linear Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.34349, p-value = 0.000000000001035
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Linear Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.009074, df = 1, p-value = 0.9241
perform_diagnostics(log_log_mod, "Log-Log Model")
## [1] "Durbin-Watson Test ( Log-Log Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.22651, p-value = 0.0000000000000002973
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Log-Log Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 14.377, df = 1, p-value = 0.0001496
perform_diagnostics(sqrt_model, "Square Root Model")
## [1] "Durbin-Watson Test ( Square Root Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.22835, p-value = 0.0000000000000003264
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Square Root Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 7.1666, df = 1, p-value = 0.007428
cor.test(Cleaned_4_MMDAs_Data$Population, Cleaned_4_MMDAs_Data$IGF)
##
## Pearson's product-moment correlation
##
## data: Cleaned_4_MMDAs_Data$Population and Cleaned_4_MMDAs_Data$IGF
## t = 2.9299, df = 38, p-value = 0.005708
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1359439 0.6534081
## sample estimates:
## cor
## 0.4292744
Therefore, from the analysis so far, we found a moderate and statistically significant positive linear relationship between population and IGF revenue (Pearson's product-moment correlation coefficient = 0.4293, p = 0.005708, 95% confidence interval 0.136 to 0.653). Population size is correlated with IGF revenue performance, but the relationship is far from perfect. The regression assumptions are not met even after the transformations, so the regression estimates for this relationship should be interpreted with caution.
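The correlation test and the simple linear regression are two views of the same result. The sketch below uses only the figures printed above to show that the squared correlation reproduces the Multiple R-squared of `mod1` and that the t statistic of the correlation equals the t value of the Population slope.

```r
r <- 0.4292744   # Pearson correlation reported by cor.test()
n <- 40          # number of observations

r_squared <- r^2                               # Multiple R-squared of mod1 (0.1843)
t_stat    <- r * sqrt(n - 2) / sqrt(1 - r^2)   # t value of the Population slope (2.930)

print(round(c(r_squared = r_squared, t = t_stat), 4))
```

This is why the p-value of the correlation test (0.005708) is identical to the p-value of the slope in the linear model.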
Cleaned_4_MMDAs_Data %>% skim(Population)
| Name | Piped data |
| Number of rows | 40 |
| Number of columns | 73 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Population | 0 | 1 | 1376317 | 1156872 | 174370 | 348984.8 | 657000 | 2263250 | 3630000 | ▇▁▃▂▂ |
Cleaned_4_MMDAs_Data %>% skim(DACF)
| Name | Piped data |
| Number of rows | 40 |
| Number of columns | 73 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| DACF | 0 | 1 | 4031443 | 2372091 | 802346.2 | 2202404 | 3356098 | 6050121 | 9497586 | ▇▇▂▅▂ |
# Histograms
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Population", x = "Population")
ggplot(Cleaned_4_MMDAs_Data, aes(x = DACF)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of DACF Revenue", x = "DACF Revenue")
# Plot of Trends
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Population)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Trends in Population Growth ",
x = "Year (2012-2022)",
y = "Population"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = DACF)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Trends in DACF Revenue (Ghana Cedis) Growth ",
x = "Year (2012-2022)",
y = "DACF Revenue (Ghana Cedis)"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = DACF)) +
geom_point(color = "blue") +
labs( title = "Population vs. DACF Revenue",
x = "population", y = "DACF Revenue (Ghana Cedis)") +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
The histograms show uneven, right-skewed distributions of population and DACF revenue. The scatter plot shows a positive relationship between population and DACF revenue.
mod2 <- lm(DACF ~ Population, data = Cleaned_4_MMDAs_Data)
summary(mod2)
##
## Call:
## lm(formula = DACF ~ Population, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3261572 -1024436 13929 1022584 4732563
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2230187.9602 457985.5435 4.870 0.00001994 ***
## Population 1.3088 0.2561 5.111 0.00000938 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1850000 on 38 degrees of freedom
## Multiple R-squared: 0.4074, Adjusted R-squared: 0.3918
## F-statistic: 26.12 on 1 and 38 DF, p-value: 0.000009376
Cleaned_4_MMDAs_Data %>%
ggplot(aes(x = Population, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Population", y = "DACF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and DACF Revenue") +
scale_y_continuous(labels = scales::comma)
# Quadratic Regression (named mod2_quad so the IGF model mod_quad is not overwritten)
mod2_quad <- lm(DACF ~ Population + Population_Squared, data = Cleaned_4_MMDAs_Data)
summary(mod2_quad)
##
## Call:
## lm(formula = DACF ~ Population + Population_Squared, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3484317 -1027457 34632 1177382 4253610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1722652.9777718000 644870.5054404821 2.671 0.0112 *
## Population 2.5205136256 1.1169596554 2.257 0.0300 *
## Population_Squared -0.0000003627 0.0000003255 -1.114 0.2723
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1844000 on 37 degrees of freedom
## Multiple R-squared: 0.4266, Adjusted R-squared: 0.3957
## F-statistic: 13.77 on 2 and 37 DF, p-value: 0.00003395
The regression results show a statistically significant linear relationship between population and DACF revenue (p = 0.000009376, R-squared = 0.4074, Adjusted R-squared = 0.3918). The Population coefficient of 1.3088 is positive, so DACF revenue rises with population, and Population explains 40.74% of the variation in DACF revenue. The quadratic model is also significant overall (F = 13.77, p = 0.00003395), but the squared term is not (p = 0.2723) and the Adjusted R-squared barely changes (0.3957).
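The small gap between the two Adjusted R-squared values follows from the penalty for the extra parameter. As a check on the arithmetic, the sketch below recomputes both values from the printed R-squared figures using a hypothetical helper `adj_r2`. Results agree with the output only to within rounding, because the inputs are rounded to four decimals.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

adj_linear <- adj_r2(0.4074, n = 40, p = 1)   # DACF ~ Population
adj_quad   <- adj_r2(0.4266, n = 40, p = 2)   # adds Population_Squared

print(round(c(linear = adj_linear, quadratic = adj_quad), 4))
```

The squared term raises R-squared by about 0.019, but after the adjustment the gain is only about 0.004.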
# Residual
ggplot(data = data.frame(residuals = residuals(mod2),
fitted = fitted(mod2)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted",
x = "Fitted Values", y = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod2)),
aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals", x = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod2)),
aes(sample = residuals)) +
stat_qq() +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals ")
shapiro.test(resid(mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(mod2)
## W = 0.97786, p-value = 0.6104
# Autocorrelation
dwtest(mod2)
##
## Durbin-Watson test
##
## data: mod2
## DW = 1.6924, p-value = 0.1292
## alternative hypothesis: true autocorrelation is greater than 0
# Homoscedasticity (Constant Variance of Residuals)
bptest(mod2)
##
## studentized Breusch-Pagan test
##
## data: mod2
## BP = 3.6303, df = 1, p-value = 0.05674
# Multicollinearity
# Not applicable: this is a simple linear regression with a single predictor (Population).
# Normality of Residuals
# Checked with the Shapiro-Wilk test on the residuals of mod2.
shapiro.test(resid(mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(mod2)
## W = 0.97786, p-value = 0.6104
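The tests indicate that the assumptions of linear regression are met for the linear model at the 5% level: normality of residuals (Shapiro-Wilk p = 0.6104), no autocorrelation (Durbin-Watson p = 0.1292), and constant variance (Breusch-Pagan p = 0.05674, a borderline result).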
# Transformed Models
log_mod2 <- lm(log(DACF) ~ log(Population), data = Cleaned_4_MMDAs_Data)
summary(log_mod2)
##
## Call:
## lm(formula = log(DACF) ~ log(Population), data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.13736 -0.30598 0.09786 0.33927 0.83955
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.7185 1.0773 9.021 0.0000000000552 ***
## log(Population) 0.3879 0.0785 4.941 0.0000159394618 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5015 on 38 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3752
## F-statistic: 24.42 on 1 and 38 DF, p-value: 0.00001594
sqrt_mod2 <- lm(sqrt(DACF) ~ sqrt(Population), data = Cleaned_4_MMDAs_Data)
summary(sqrt_mod2)
##
## Call:
## lm(formula = sqrt(DACF) ~ sqrt(Population), data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -855.03 -282.33 58.54 313.93 910.86
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1128.3519 166.6564 6.771 0.0000000504 ***
## sqrt(Population) 0.7492 0.1421 5.274 0.0000056320 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 452.1 on 38 degrees of freedom
## Multiple R-squared: 0.4226, Adjusted R-squared: 0.4074
## F-statistic: 27.81 on 1 and 38 DF, p-value: 0.000005632
# Scatter Plots (Transformed Data)
ggplot(Cleaned_4_MMDAs_Data, aes(x = log(Population), y = log(DACF))) +
geom_point() +
geom_smooth(method = "lm")+
labs(title = "Log(Population) vs. Log(DACF Revenue)",
x = "Log(Population)", y = "Log(DACF Revenue)")
ggplot(Cleaned_4_MMDAs_Data, aes(x = sqrt(Population), y = sqrt(DACF))) +
geom_point() +
geom_smooth(method = "lm")+
labs(title = "Sqrt(Population) vs. Sqrt(DACF Revenue)",
x = "Sqrt(Population)", y = "Sqrt(DACF Revenue)")
# Function to perform diagnostic tests and plots
perform_diagnostics <- function(model, model_name) {
# Residuals vs. Fitted
plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")
# Q-Q Plot of Residuals
plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))
# Durbin-Watson Test
dw_test <- dwtest(model)
print(paste("Durbin-Watson Test (", model_name, "):"))
print(dw_test)
# Breusch-Pagan Test
bp_test <- bptest(model)
print(paste("Breusch-Pagan Test (", model_name, "):"))
print(bp_test)
# Print VIF (if applicable)
if (length(coef(model)) > 2) { # Check for multiple predictors
vif_result <- vif(model)
print(paste("VIF (", model_name, "):"))
print(vif_result)
}
# Arrange plots
grid.arrange(plot1, plot2, plot3, nrow = 1)
}
# Perform diagnostics for each model
perform_diagnostics(mod2, "Linear Model")
## [1] "Durbin-Watson Test ( Linear Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.6924, p-value = 0.1292
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Linear Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 3.6303, df = 1, p-value = 0.05674
perform_diagnostics(log_mod2, "Log-Log Model")
## [1] "Durbin-Watson Test ( Log-Log Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.6013, p-value = 0.06482
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Log-Log Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.010191, df = 1, p-value = 0.9196
perform_diagnostics(sqrt_mod2, "Square Root Model")
## [1] "Durbin-Watson Test ( Square Root Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.6579, p-value = 0.1073
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Square Root Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 2.0396, df = 1, p-value = 0.1533
shapiro.test(resid(mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(mod2)
## W = 0.97786, p-value = 0.6104
shapiro.test(resid(log_mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(log_mod2)
## W = 0.94962, p-value = 0.07356
shapiro.test(resid(sqrt_mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(sqrt_mod2)
## W = 0.9674, p-value = 0.2968
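Both the log-log and square root models are statistically significant. On its own scale, the square root model has a slightly higher R-squared (0.4226) than the linear model (0.4074), while the log-log model's is slightly lower (0.3912). Because each model has a differently transformed response, these values are not directly comparable.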
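None of the assumptions is violated: at the 5% level, the Durbin-Watson, Breusch-Pagan, and Shapiro-Wilk tests do not reject their null hypotheses for any of the three models.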
From the regression analysis, the linear, log-log, and square root models are all statistically significant and meet the regression assumptions. In the linear model, each additional person is associated with an increase of about 1.31 Ghana cedis in DACF. In the log-log model, a 1% increase in population is associated with an increase of about 0.39% in DACF. In the square root model, a one-unit increase in the square root of Population is associated with a 0.7492-unit increase in the square root of DACF.
Given these models, it can be concluded that population is a statistically significant predictor of DACF revenue performance and that the observed pattern is unlikely to be due to chance.
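The "1% gives about 0.39%" reading of the log-log slope is a small-change approximation. For larger changes the exact implied effect is (1 + g)^β − 1, where g is the proportional change in population. The sketch below applies this to the reported slope; `pct_change_dacf` is an illustrative helper name.

```r
beta_log <- 0.3879   # slope of log(DACF) on log(Population), from summary(log_mod2)

# Exact percentage change in DACF implied by a given percentage change in Population
pct_change_dacf <- function(pct_pop, beta = beta_log) {
  ((1 + pct_pop / 100)^beta - 1) * 100
}

print(round(pct_change_dacf(c(1, 10, 50)), 2))
```

Because the slope is below 1, DACF grows less than proportionally with population under this model.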
Cleaned_4_MMDAs_Data %>% skim(Capital_Expenditure)
| Name | Piped data |
| Number of rows | 40 |
| Number of columns | 73 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Capital_Expenditure | 0 | 1 | 10950443 | 9929727 | 895337.7 | 4780086 | 7434365 | 11857008 | 46223724 | ▇▃▂▁▁ |
Cleaned_4_MMDAs_Data %>% skim(Recrrent_Expenditure)
| Name | Piped data |
| Number of rows | 40 |
| Number of columns | 73 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Recrrent_Expenditure | 9 | 0.78 | 11583810 | 7734814 | 864055.3 | 3741295 | 13550852 | 17984476 | 24388461 | ▇▁▃▅▃ |
# Capital Expenditure Histogram
cap_hist <- ggplot(Cleaned_4_MMDAs_Data, aes(x = Capital_Expenditure)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "dodgerblue", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Capital Expenditure", x = "Capital Expenditure (Ghana Cedis)", y = "Density") +
scale_x_continuous(labels = comma)
recu_hist <- ggplot(Cleaned_4_MMDAs_Data, aes(x = Recrrent_Expenditure)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "dodgerblue", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Recurrent Expenditure ", x = "Recurrent Expenditure (Ghana Cedis)", y = "Density") +
scale_x_continuous(labels = comma)
# Population Histogram
pop_hist <- ggplot(Cleaned_4_MMDAs_Data, aes(x = Population)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "dodgerblue", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Population", x = "Population", y = "Density") +
scale_x_continuous(labels = comma)
cap_hist
recu_hist
pop_hist
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Population)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Population Trend",
x = "Year (2012-2022)",
y = "Population"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Capital_Expenditure)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Capital Expenditure Trend",
x = "Year (2012-2022)",
y = "Capital Expenditure"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Recrrent_Expenditure)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Recurrent Expenditure Trend",
x = "Year (2012-2022)",
y = "Recurrent Expenditure"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = Capital_Expenditure)) +
geom_point(color = "blue") +
labs(title = "Population vs. Capital Expenditure",
x = "Population", y = "Capital Expenditure (Ghana Cedis)") +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = Recrrent_Expenditure)) +
geom_point(color = "blue") +
labs(title = "Population vs. Recurrent Expenditure",
x = "Population", y = "Recurrent Expenditure (Ghana Cedis)") +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
# Calculate Per Capita Values
Cleaned_4_MMDAs_Data$Capital_Exp_Per_Capita <- Cleaned_4_MMDAs_Data$Capital_Expenditure / Cleaned_4_MMDAs_Data$Population
# Per Capita Analysis
average_capita <- mean(Cleaned_4_MMDAs_Data$Capital_Exp_Per_Capita)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Capital_Exp_Per_Capita)) +
# A fixed colour is set directly, so no colour mapping or legend title is needed
geom_point(color = "blue") +
labs(title = "Capital Expenditure Per Capita Over Time", x = "Year (2012-2022)", y = "Ghana Cedis Per Capita") +
scale_y_continuous(labels = comma)
mod3 <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population, data = Cleaned_4_MMDAs_Data)
summary(mod3)
## Response Capital_Expenditure :
##
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11328484 -4446239 -2234438 2144594 31316925
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6365461.982 2363138.031 2.694 0.0116 *
## Population 3.097 1.316 2.353 0.0256 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9019000 on 29 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.1603, Adjusted R-squared: 0.1314
## F-statistic: 5.538 on 1 and 29 DF, p-value: 0.02561
##
##
## Response Recrrent_Expenditure :
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8040592 -5802868 -1976070 5056903 15867100
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7440523.2478 1769853.4264 4.204 0.000229 ***
## Population 3.1691 0.9856 3.215 0.003192 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6755000 on 29 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.2628, Adjusted R-squared: 0.2374
## F-statistic: 10.34 on 1 and 29 DF, p-value: 0.003192
mod_cap <- lm(Capital_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
summary(mod_cap)
##
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12766743 -5251934 -2102442 3645339 30310330
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6006767.436 2261882.225 2.656 0.0115 *
## Population 3.592 1.265 2.840 0.0072 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9136000 on 38 degrees of freedom
## Multiple R-squared: 0.1751, Adjusted R-squared: 0.1534
## F-statistic: 8.068 on 1 and 38 DF, p-value: 0.007202
mod_rec <- lm(Recrrent_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
summary(mod_rec)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8040592 -5802868 -1976070 5056903 15867100
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7440523.2478 1769853.4264 4.204 0.000229 ***
## Population 3.1691 0.9856 3.215 0.003192 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6755000 on 29 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.2628, Adjusted R-squared: 0.2374
## F-statistic: 10.34 on 1 and 29 DF, p-value: 0.003192
Cleaned_4_MMDAs_Data %>%
ggplot(aes(x = Population, y = Capital_Expenditure)) +
geom_point()+
geom_smooth(method = "lm", se = TRUE) + labs(x = "Population", y = "Capital Expenditure", title = "Linear Relationship Population and Capital Expenditure")+
scale_y_continuous(labels = scales::comma)
Cleaned_4_MMDAs_Data %>%
ggplot(aes(x = Population, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Population", y = "Recurrent Expenditure", title = "Linear Relationship Population and Recurrent Expenditure") +
scale_y_continuous(labels = scales::comma)
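The linear regression results show a statistically significant positive relationship between Population and both Capital Expenditure (slope = 3.59, p = 0.007, R-squared = 0.175) and Recurrent Expenditure (slope = 3.17, p = 0.003, R-squared = 0.263). Population nevertheless explains only about 18% and 26% of the variation in the two expenditure types, respectively.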
# Diagnostic Function
perform_diagnostics <- function(model, model_name) {
# Residuals vs. Fitted
plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")
# Q-Q Plot of Residuals
plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))
# Durbin-Watson Test
dw_test <- dwtest(model)
print(paste("Durbin-Watson Test (", model_name, "):"))
print(dw_test)
# Breusch-Pagan Test
bp_test <- bptest(model)
print(paste("Breusch-Pagan Test (", model_name, "):"))
print(bp_test)
# Print VIF (if applicable)
if (length(coef(model)) > 2) { # Check for multiple predictors
vif_result <- vif(model)
print(paste("VIF (", model_name, "):"))
print(vif_result)
}
# Arrange plots
grid.arrange(plot1, plot2, plot3, nrow = 1)
}
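The Durbin-Watson statistic reported by dwtest() is the sum of squared differences between successive residuals divided by the residual sum of squares. The sketch below computes it by hand on simulated data; the toy series, the AR(1) coefficient of 0.7, and the names toy_model and dw_manual are illustrative assumptions, not part of the analysis. Because the statistic depends on row order, it is only meaningful when the rows are ordered in time.

```r
# Durbin-Watson statistic by hand on simulated data (illustrative only)
set.seed(1)
n <- 40
x <- seq_len(n)

# AR(1) errors with positive autocorrelation (0.7 is an arbitrary choice)
e <- as.numeric(stats::filter(rnorm(n), filter = 0.7, method = "recursive"))
y <- 2 + 0.5 * x + e
toy_model <- lm(y ~ x)

# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)
res <- residuals(toy_model)
dw_manual <- sum(diff(res)^2) / sum(res^2)
dw_manual
```

Values near 2 are consistent with uncorrelated residuals, while values well below 2 point towards positive autocorrelation.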
# Perform Diagnostics
# Capital Expenditure
perform_diagnostics(mod_cap, "Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Capital Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.86878, p-value = 0.00002011
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Capital Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 8.1362, df = 1, p-value = 0.004339
perform_diagnostics(mod_rec, "Recurrent Expenditure Model")
## [1] "Durbin-Watson Test ( Recurrent Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.52471, p-value = 0.0000000961
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Recurrent Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 6.3745, df = 1, p-value = 0.01158
# Recurrent Expenditure
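Both linear models violate key assumptions of linear regression. The Durbin-Watson tests indicate positive autocorrelation in the residuals (DW = 0.87 and 0.52, both p < 0.001), and the Breusch-Pagan tests indicate heteroscedasticity (p = 0.004 and p = 0.012).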
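One common response to autocorrelated and heteroscedastic residuals is to keep the OLS coefficients but report heteroscedasticity- and autocorrelation-consistent (Newey-West) standard errors. The following is a minimal base R sketch on simulated data, assuming a Bartlett kernel with an arbitrary lag of 2 and no small-sample adjustment; toy, toy_mod, and max_lag are hypothetical names, and this is not the method used in the analysis above. It also assumes the rows form a single time-ordered series, which stacked data from several MMDAs would not satisfy without further care.

```r
# Newey-West (HAC) standard errors by hand on simulated data (illustrative only)
set.seed(42)
n <- 40
toy <- data.frame(x = runif(n, 1e5, 3e6))
toy$y <- 6e6 + 3 * toy$x + rnorm(n, sd = toy$x)  # error spread grows with x
toy_mod <- lm(y ~ x, data = toy)

X  <- model.matrix(toy_mod)
e  <- residuals(toy_mod)
Xe <- X * e                      # row t holds x_t * e_t
max_lag <- 2                     # arbitrary truncation lag (an assumption)

# "Meat": lag-0 term plus Bartlett-weighted autocovariance terms
S <- crossprod(Xe)
for (l in seq_len(max_lag)) {
  w <- 1 - l / (max_lag + 1)
  G <- crossprod(Xe[(l + 1):n, , drop = FALSE], Xe[1:(n - l), , drop = FALSE])
  S <- S + w * (G + t(G))
}

# Sandwich: (X'X)^-1 S (X'X)^-1
bread  <- solve(crossprod(X))
se_hac <- sqrt(diag(bread %*% S %*% bread))
se_ols <- sqrt(diag(vcov(toy_mod)))
cbind(se_ols, se_hac)
```

Comparing the two columns shows how much the classical standard errors change once the residual structure is taken into account.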
Cleaned_4_MMDAs_Data$Recrrent_Expenditure_squared <- Cleaned_4_MMDAs_Data$Recrrent_Expenditure^2
Cleaned_4_MMDAs_Data$Capital_Expenditure_squared <- Cleaned_4_MMDAs_Data$Capital_Expenditure^2
mod_quad <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population + Population_Squared, data = Cleaned_4_MMDAs_Data)
# View the summary
summary(mod_quad)
## Response Capital_Expenditure :
##
## Call:
## lm(formula = Capital_Expenditure ~ Population + Population_Squared,
## data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11103603 -5642278 -2484793 2604630 29136527
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2092421.801843737 3696487.467239643 0.566 0.5759
## Population 14.211410888 7.604401699 1.869 0.0721 .
## Population_Squared -0.000003182 0.000002145 -1.483 0.1492
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8838000 on 28 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.2215, Adjusted R-squared: 0.1659
## F-statistic: 3.983 on 2 and 28 DF, p-value: 0.03004
##
##
## Response Recrrent_Expenditure :
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Population + Population_Squared,
## data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7887901 -4481014 -1944637 4240246 15154962
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11010190.152475385 2741774.699953048 4.016 0.000403 ***
## Population -6.115856635 5.640369776 -1.084 0.287480
## Population_Squared 0.000002658 0.000001591 1.670 0.105992
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6555000 on 28 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.3296, Adjusted R-squared: 0.2817
## F-statistic: 6.883 on 2 and 28 DF, p-value: 0.003704
# Scatter Plots (Transformed Data)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
labs(x = "Population", y = "Capital Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Capital Expenditure") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
labs(x = "Population", y = "Recurrent Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Recurrent Expenditure") +
scale_y_continuous(labels = comma)
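The quadratic model gives a modest improvement in fit over the linear model: adjusted R-squared rises from 0.131 to 0.166 for capital expenditure and from 0.237 to 0.282 for recurrent expenditure. Both overall F-tests remain significant (p = 0.030 and p = 0.004), but neither squared term is individually significant at the 5% level (p = 0.149 and p = 0.106).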
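For a quadratic fit y = β₀ + β₁x + β₂x², the curve turns at x = -β₁ / (2β₂). As a rough worked example, the snippet below plugs in the coefficients printed in the summary(mod_quad) output; the helper turning_point() is a hypothetical name introduced here. Since the squared terms are estimated imprecisely, the resulting turning points are indicative at best.

```r
# Coefficients copied from the summary(mod_quad) output
b1_cap <- 14.211410888
b2_cap <- -0.000003182
b1_rec <- -6.115856635
b2_rec <- 0.000002658

# Turning point of y = b0 + b1 * x + b2 * x^2
turning_point <- function(b1, b2) -b1 / (2 * b2)

tp_cap <- turning_point(b1_cap, b2_cap)  # inverted U: expenditure peaks here
tp_rec <- turning_point(b1_rec, b2_rec)  # U shape: expenditure is lowest here
c(capital = tp_cap, recurrent = tp_rec)
```

This gives turning points at a population of roughly 2.2 million for capital expenditure and roughly 1.15 million for recurrent expenditure.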
# Log Transformation for Capital Expenditure
Cleaned_4_MMDAs_Data$Capital_Expenditure_adjusted <- Cleaned_4_MMDAs_Data$Capital_Expenditure + 1
log_cap_mod <- lm(log(Capital_Expenditure_adjusted) ~ Population, data = Cleaned_4_MMDAs_Data)
summary(log_cap_mod)
##
## Call:
## lm(formula = log(Capital_Expenditure_adjusted) ~ Population,
## data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.84433 -0.41447 -0.02797 0.66523 1.35168
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.4252198367 0.1993591208 77.374 < 0.0000000000000002 ***
## Population 0.0000003162 0.0000001115 2.837 0.00727 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8053 on 38 degrees of freedom
## Multiple R-squared: 0.1748, Adjusted R-squared: 0.1531
## F-statistic: 8.048 on 1 and 38 DF, p-value: 0.007265
perform_diagnostics(log_cap_mod, "Log capital Expenditure Model")
## [1] "Durbin-Watson Test ( Log capital Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.52255, p-value = 0.000000002959
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Log capital Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.21996, df = 1, p-value = 0.6391
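In a model with a logged response and an unlogged predictor, the slope is read as an approximate proportional change: an increase of d in the predictor multiplies the response by exp(β₁ × d). The sketch below applies this to the Population slope printed in the summary(log_cap_mod) output; the increase of 100,000 people is an arbitrary illustrative choice.

```r
# Slope copied from the summary(log_cap_mod) output
b_pop <- 0.0000003162

# Illustrative population increase (an assumption, not taken from the data)
delta_pop <- 100000

# Percentage change in capital expenditure implied by the log-linear model
pct_change <- (exp(b_pop * delta_pop) - 1) * 100
pct_change
```

Under this model, 100,000 additional residents are associated with about 3.2% higher capital expenditure.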
Cleaned_4_MMDAs_Data$Ln_Population <- log(Cleaned_4_MMDAs_Data$Population)
Cleaned_4_MMDAs_Data$Ln_Capital_Expenditure <- log(Cleaned_4_MMDAs_Data$Capital_Expenditure)
ggplot(Cleaned_4_MMDAs_Data, aes(x = log(Population), y = log(Capital_Expenditure))) +
geom_point() +
geom_smooth(method = "lm", se = TRUE)+
labs(title = "Log(Population) vs. Log(Capital Expenditure)",
x = "Log(Population)", y = "Log(Capital Expenditure)")
# Square root transformation for Capital Expenditure
sqrt_cap_mod <- lm(sqrt(Capital_Expenditure) ~ Population, data = Cleaned_4_MMDAs_Data)
summary(sqrt_cap_mod)
##
## Call:
## lm(formula = sqrt(Capital_Expenditure) ~ Population, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2170.2 -824.0 -256.1 755.6 3076.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2364.5993423 298.4236837 7.924 0.00000000144 ***
## Population 0.0004922 0.0001668 2.950 0.00541 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1205 on 38 degrees of freedom
## Multiple R-squared: 0.1863, Adjusted R-squared: 0.1649
## F-statistic: 8.703 on 1 and 38 DF, p-value: 0.005414
perform_diagnostics(sqrt_cap_mod, "Square root Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Square root Capital Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 0.72057, p-value = 0.000000898
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Square root Capital Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 7.145, df = 1, p-value = 0.007517
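Across the linear, log, and square-root specifications above, Population has a positive and statistically significant relationship with Capital Expenditure, and the linear model shows the same for Recurrent Expenditure. The Durbin-Watson tests remain significant in every specification, so the residuals are still autocorrelated; only the log transformation removes the heteroscedasticity (Breusch-Pagan p = 0.639).
The next analysis examines the relationship between the total revenue growth rate and infrastructure delivery (capital expenditure per capita).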
# Descriptive statistics
Cleaned_4_MMDAs_Data %>% skim(Capital_Exp_Per_Capita)
| Name | Piped data |
| Number of rows | 40 |
| Number of columns | 78 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Capital_Exp_Per_Capita | 0 | 1 | 14.11 | 14.5 | 0.73 | 4.36 | 8.97 | 18.44 | 58.96 | ▇▃▂▁▁ |
Cleaned_4_MMDAs_Data %>% skim(TtRev_Growth_Rate)
| Name | Piped data |
| Number of rows | 40 |
| Number of columns | 78 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| TtRev_Growth_Rate | 3 | 0.92 | 2.85 | 25.27 | -81.19 | -11.06 | 7.9 | 18.14 | 40.94 | ▁▁▃▇▆ |
# Histograms
ggplot(Cleaned_4_MMDAs_Data, aes(x = Capital_Exp_Per_Capita)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Capital expenditure per capita", x = "Capital expenditure per capita") +
scale_x_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = TtRev_Growth_Rate)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Total Revenue Growth Rate", x = "Total revenue growth rate")
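The histograms show that both variables are unevenly distributed: Capital Expenditure Per Capita is right-skewed, with most values below about 18 Ghana Cedis, and the Total Revenue Growth Rate is left-skewed, with a few large negative values.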
mod5 <- lm(Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_4_MMDAs_Data)
summary(mod5)
##
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.180 -10.176 -4.637 6.675 44.986
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.57674 2.46733 5.908 0.00000102 ***
## TtRev_Growth_Rate 0.08449 0.09835 0.859 0.396
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.91 on 35 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.02065, Adjusted R-squared: -0.007331
## F-statistic: 0.738 on 1 and 35 DF, p-value: 0.3961
ggplot(Cleaned_4_MMDAs_Data, aes(x = TtRev_Growth_Rate, y = Capital_Exp_Per_Capita)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE)+
labs(title = "Revenue Growth vs. Capital Expenditure (Per Capita)",
x = "Total Revenue Growth Rate (%)",
y = "Capital Expenditure Per Capita")
The regression results show no statistically significant relationship between the total revenue growth rate and infrastructure delivery (capital expenditure per capita): the p-value (0.3961) is greater than the 0.05 significance level. Changes in revenue growth therefore do not significantly predict changes in capital expenditure per capita in this model. The R-squared (0.02065) indicates that only 2.07% of the variation in capital expenditure per capita is explained by the total revenue growth rate.
Cleaned_4_MMDAs_Data$Expenditure_Growth <- c(NA, diff(Cleaned_4_MMDAs_Data$Total_Expenditure) / Cleaned_4_MMDAs_Data$Total_Expenditure[-nrow(Cleaned_4_MMDAs_Data)]) * 100
ggplot(Cleaned_4_MMDAs_Data, aes(x = Expenditure_Growth, y = Capital_Exp_Per_Capita)) +
geom_point() + geom_smooth(method = "lm", se = TRUE)+
labs(title = "Relationship Expenditure Growth vs. Capital Expenditure (Per Capita)",
x = "Expenditure Growth Rate (%)",
y = "Capital Expenditure Per Capita")
The plot shows no clear linear relationship between the expenditure growth rate and capital expenditure per capita.
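When rows for several districts are stacked in one data frame, a growth rate built with diff() over the whole column also compares the last year of one district with the first year of the next. A minimal sketch of a per-district calculation follows, using toy data and a hypothetical identifier column MMDA (the actual name of the district column is not shown here, so this is an assumption).

```r
# Toy data: two hypothetical districts, three years each
toy <- data.frame(
  MMDA = rep(c("A", "B"), each = 3),  # hypothetical district identifier
  Year = rep(2012:2014, times = 2),
  Total_Expenditure = c(100, 110, 121, 200, 180, 198)
)

# Sort so that each district's rows are in year order
toy <- toy[order(toy$MMDA, toy$Year), ]

# Year-on-year growth (%) computed separately within each district;
# the first year of every district is NA
toy$Expenditure_Growth <- ave(
  toy$Total_Expenditure, toy$MMDA,
  FUN = function(v) c(NA, diff(v) / head(v, -1) * 100)
)
toy
```

The same pattern can be applied to any of the growth-rate columns, provided a district identifier is available.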
# no variables
# Trends of Revenue and Expenditure over the years.
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Total_Revenue)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(title = "Total Revenue Trend",
x = "Year (2012-2022)",
y = "Amount (Ghana Cedis)") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Total_Revenue)) +
geom_bar(stat = "identity", fill = "dodgerblue") +
labs(title = "Total Revenue Trend",
x = "Year",
y = "Amount (Ghana Cedis)") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Total_Expenditure)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Trends in Total Expenditure Growth ",
x = "Year (2012-2022)",
y = "Amount (Ghana Cedis)"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Total_Expenditure)) +
geom_bar(stat = "identity", fill = "dodgerblue") +
labs(title = "Total Expenditure Trend",
x = "Year",
y = "Amount (Ghana Cedis)") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year)) +
geom_point(aes(y = Total_Revenue, color = "Total Revenue")) +
geom_point(aes(y = Total_Expenditure, color = "Total Expenditure")) +
labs(title = "Revenue Vs. Expenditure Trends Over Years",
x = "Year",
y = "Amount (Ghana Cedis)", color = "Type") +
scale_color_manual(values = c("Total Revenue" = "blue", "Total Expenditure" = "red")) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Total_Revenue, y = Total_Expenditure)) +
geom_point(color = "blue") +
labs( title = "Total Revenue Vs. Total Expenditure (Ghana Cedis)",
x = "Total Revenue", y = "Total Expenditure ") +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma) +
scale_x_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year)) +
# geom_point() has no linewidth parameter, so it is dropped here
geom_point(aes(y = IGF, color = "IGF")) +
geom_point(aes(y = DACF, color = "DACF")) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_point(aes(y = Others_Sources, color = "Other Sources")) +
labs(
title = "Revenue Trends",
x = "Year",
y = "Amount (Ghana Cedis)",
color = "Type"
) +
scale_color_manual(
values = c(
"Total Revenue" = "#0000FF", # Blue
"Other Sources" = "#87CEEB", # Light Blue
"IGF" = "#00CD66", # Green
"DACF" = "#808080", # Gray
"Capital Expenditure" = "#9370DB", # Purple
"Total Expenditure" = "#FF0000", # Red
"Recurrent Expenditure" = "#FFD700" # Yellow
)
) +
scale_y_continuous(labels = comma, breaks = seq(0, 60000000, 10000000)) + # Added breaks
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# IGF to Total Expenditure Ratio
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = IGF_TE)) +
geom_point(size = 2.5) +
labs(
title = "IGF to Total Expenditure Ratio Over Years",
x = "Year",
y = "Ratio (IGF/Total Expenditure)"
) +
scale_y_continuous(labels = percent_format(accuracy = 1))
cor.test(Cleaned_4_MMDAs_Data$Total_Expenditure, Cleaned_4_MMDAs_Data$Total_Revenue)
##
## Pearson's product-moment correlation
##
## data: Cleaned_4_MMDAs_Data$Total_Expenditure and Cleaned_4_MMDAs_Data$Total_Revenue
## t = 39.799, df = 38, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9776727 0.9937968
## sample estimates:
## cor
## 0.9882165
# Revenue Per Capita
Cleaned_4_MMDAs_Data$Total_Revenue_Per_Capita <- Cleaned_4_MMDAs_Data$Total_Revenue / Cleaned_4_MMDAs_Data$Population
Cleaned_4_MMDAs_Data$IGF_Per_Capita <- Cleaned_4_MMDAs_Data$IGF / Cleaned_4_MMDAs_Data$Population
Cleaned_4_MMDAs_Data$DACF_Per_Capita <- Cleaned_4_MMDAs_Data$DACF / Cleaned_4_MMDAs_Data$Population
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year)) +
# geom_point() has no linewidth parameter, so it is dropped here
geom_point(aes(y = IGF, color = "IGF")) +
geom_point(aes(y = DACF, color = "DACF")) +
geom_point(aes(y = Others_Sources, color = "Other Sources")) +
labs(
title = "Revenue Trends",
x = "Year",
y = "Amount (Ghana Cedis)",
color = "Type"
) +
scale_color_manual(
values = c(
"Total Revenue" = "#0000FF", # Blue
"Other Sources" = "#87CEEB", # Light Blue
"IGF" = "#00CD66", # Green
"DACF" = "#808080", # Gray
"Capital Expenditure" = "#9370DB", # Purple
"Total Expenditure" = "#FF0000", # Red
"Recurrent Expenditure" = "#FFD700" # Yellow
)
) +
scale_y_continuous(labels = comma, breaks = seq(0, 60000000, 10000000)) + # Added breaks
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Bar charts: Total Expenditure, Population, and IGF trends
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Total_Expenditure)) +
geom_bar(stat = "identity", fill = "dodgerblue") +
geom_point()+
labs(title = "Total Expenditure Trend",
x = "Year",
y = "Amount (Ghana Cedis)") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Population)) +
geom_bar(stat = "identity", fill = "dodgerblue") +
geom_point()+
labs(title = "Population Trend",
x = "Year",
y = "Population")
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = IGF)) +
geom_bar(stat = "identity", fill = "dodgerblue") +
geom_point()+
labs(title = "IGF Trend",
x = "Year",
y = "IGF") +
scale_y_continuous(labels = comma)
# Per capita plot
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
geom_point(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
geom_line(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
geom_point(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
geom_line(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
geom_point(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
labs(title = "Revenue Per Capita trends", x = "Year", y = "Amount (Ghana Cedis)", color = "Type") +
scale_y_continuous(labels = comma)
cor_matrix <- cor(Cleaned_4_MMDAs_Data[, c("Population", "Total_Revenue", "Total_Expenditure", "IGF_TE", "IGF")], use = "complete.obs")
print(cor_matrix)
## Population Total_Revenue Total_Expenditure IGF_TE
## Population 1.0000000 0.5299488 0.5520108 0.1521863
## Total_Revenue 0.5299488 1.0000000 0.9882165 0.4711984
## Total_Expenditure 0.5520108 0.9882165 1.0000000 0.4196154
## IGF_TE 0.1521863 0.4711984 0.4196154 1.0000000
## IGF 0.4292744 0.9361730 0.9130142 0.6904556
## IGF
## Population 0.4292744
## Total_Revenue 0.9361730
## Total_Expenditure 0.9130142
## IGF_TE 0.6904556
## IGF 1.0000000
corrplot(cor_matrix, main = "Correlation matrix of population and expenditure patterns")
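The correlation matrix shows a very strong positive correlation between total revenue and total expenditure (r = 0.99), and IGF is strongly correlated with both (r = 0.94 and r = 0.91). Population is moderately correlated with total revenue (r = 0.53) and total expenditure (r = 0.55), and only weakly with IGF_TE (r = 0.15).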
# Total Revenue vs Population
model_revenue_pop <- lm(Total_Revenue ~ Population, data = Cleaned_4_MMDAs_Data)
summary(model_revenue_pop)
##
## Call:
## lm(formula = Total_Revenue ~ Population, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26468384 -15556098 -1699448 12888670 56700610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26369062.420 4978752.876 5.296 0.00000524 ***
## Population 10.723 2.784 3.852 0.000437 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20110000 on 38 degrees of freedom
## Multiple R-squared: 0.2808, Adjusted R-squared: 0.2619
## F-statistic: 14.84 on 1 and 38 DF, p-value: 0.0004366
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = Total_Revenue)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Total Revenue vs Population", x = "Population", y = "Total Revenue") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
# Total Expenditure vs Population
model_expenditure_pop <- lm(Total_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
summary(model_expenditure_pop)
##
## Call:
## lm(formula = Total_Expenditure ~ Population, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26429954 -15426864 -2272971 12574424 49267490
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25060267.648 4949117.374 5.064 0.0000109 ***
## Population 11.292 2.767 4.081 0.000222 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19990000 on 38 degrees of freedom
## Multiple R-squared: 0.3047, Adjusted R-squared: 0.2864
## F-statistic: 16.65 on 1 and 38 DF, p-value: 0.0002219
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = Total_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Total Expenditure vs Population", x = "Population", y = "Total Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
# Capital Expenditure vs Total Revenue and IGF_TE
model_capital_rev_igf <- lm(Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_4_MMDAs_Data)
summary(model_capital_rev_igf)
##
## Call:
## lm(formula = Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12647433 -3540160 -724350 2904795 25463778
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2528523.53326 2878542.65189 0.878 0.3854
## Total_Revenue 0.31826 0.05862 5.430 0.00000372 ***
## IGF_TE -11508662.07997 6569938.76021 -1.752 0.0881 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7558000 on 37 degrees of freedom
## Multiple R-squared: 0.4504, Adjusted R-squared: 0.4206
## F-statistic: 15.16 on 2 and 37 DF, p-value: 0.00001554
ggplot(Cleaned_4_MMDAs_Data, aes(x = Total_Revenue, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital Expenditure vs Total Revenue", x = "Total Revenue", y = "Capital Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Population, y = IGF_TE)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF_TE vs Population", x = "Population", y = "IGF_TE") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = percent_format(accuracy = 1))
# IGF_TE vs Population and Total Revenue
model_igfte_pop_rev <- lm(IGF_TE ~ Population + Total_Revenue, data = Cleaned_4_MMDAs_Data)
summary(model_igfte_pop_rev)
##
## Call:
## lm(formula = IGF_TE ~ Population + Total_Revenue, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.26068 -0.13832 -0.04149 0.16199 0.55307
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.239981076202 0.061202888928 3.921 0.000368 ***
## Population -0.000000024482 0.000000030605 -0.800 0.428864
## Total_Revenue 0.000000004845 0.000000001513 3.203 0.002793 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1875 on 37 degrees of freedom
## Multiple R-squared: 0.2353, Adjusted R-squared: 0.1939
## F-statistic: 5.691 on 2 and 37 DF, p-value: 0.007
ggplot(Cleaned_4_MMDAs_Data, aes(x = Total_Revenue, y = IGF_TE)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF_TE vs Total Revenue", x = "Total Revenue", y = "IGF_TE") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = percent_format(accuracy = 1))
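The regression results above show a significant positive linear relationship between Total Revenue and Population (p < 0.001) and between Total Expenditure and Population (p < 0.001). In the Capital Expenditure model, Total Revenue is significant (p < 0.001) while IGF_TE is not significant at the 5% level (p = 0.088). In the IGF_TE model, Total Revenue is significant (p = 0.003) but Population is not (p = 0.429).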
# no variables
ggplot(Cleaned_4_MMDAs_Data, aes(x = factor(Year), y = IGF)) +
geom_point(color = "dodgerblue")+
labs(title = "IGF Trend",
x = "Year",
y = "IGF (Ghana Cedis) ") +
scale_y_continuous(labels = comma)
# Land-Based Revenue Trends
ggplot(Cleaned_4_MMDAs_Data, aes(x = (Year))) +
geom_point(aes(y = Act_Permit, color = "Permit Fees")) +
geom_point(aes(y = Act_Property_Rates, color = "Property Rates")) +
geom_point(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue")) +
geom_point(aes(y = Act_Licenses, color = "Licenses")) +
geom_point(aes(y = Act_Fees, color = "Act Fees")) +
labs(
title = "Land-Based Revenue Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Set1")+
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plot above shows how each land-based revenue source changes over the years.
# IGF vs Land-Based Revenues
model_igf_land <- lm(IGF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_4_MMDAs_Data)
summary(model_igf_land)
##
## Call:
## lm(formula = IGF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2199688 -571651 -183026 552228 3629412
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 310732.5739 422989.8807 0.735 0.470701
## Act_Permit 1.1180 0.2450 4.563 0.000169 ***
## Act_Property_Rates 1.1112 0.3060 3.632 0.001562 **
## Act_Stool_Lands 1.3252 1.5361 0.863 0.398052
## Act_Licenses 0.9988 0.2762 3.616 0.001621 **
## Act_Fees 1.1632 0.2099 5.543 0.0000168 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1381000 on 21 degrees of freedom
## (13 observations deleted due to missingness)
## Multiple R-squared: 0.9942, Adjusted R-squared: 0.9928
## F-statistic: 714.5 on 5 and 21 DF, p-value: < 0.00000000000000022
cor_matrix_land_igf <- cor(Cleaned_4_MMDAs_Data[, c("IGF", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_igf)
## IGF Act_Permit Act_Property_Rates Act_Stool_Lands
## IGF 1.0000000 0.8215051 0.94359813 0.30454925
## Act_Permit 0.8215051 1.0000000 0.92357198 -0.15615766
## Act_Property_Rates 0.9435981 0.9235720 1.00000000 0.04037212
## Act_Stool_Lands 0.3045493 -0.1561577 0.04037212 1.00000000
## Act_Licenses 0.9170742 0.5578915 0.78568005 0.50605306
## Act_Fees 0.8125482 0.3809034 0.59577159 0.68482529
## Act_Licenses Act_Fees
## IGF 0.9170742 0.8125482
## Act_Permit 0.5578915 0.3809034
## Act_Property_Rates 0.7856800 0.5957716
## Act_Stool_Lands 0.5060531 0.6848253
## Act_Licenses 1.0000000 0.9013082
## Act_Fees 0.9013082 1.0000000
corrplot(cor_matrix_land_igf)
The multiple regression of IGF on the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees) is statistically significant overall (F = 714.5, p < 0.001), with a high R-squared of 0.9942: 99.42% of the variation in IGF is explained by these revenue sources. All individual terms are significant except stool lands revenue (p = 0.398).
The correlation matrix shows that IGF is strongly correlated with property rates (r = 0.94), licenses (r = 0.92), permit fees (r = 0.82), and fees (r = 0.81), but only weakly with stool lands revenue (r = 0.30).
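Several of the predictors are themselves highly correlated (for example permit fees with property rates, and licenses with fees), so it may be worth checking variance inflation factors before reading too much into individual coefficients. The sketch below computes VIFs by hand in base R on simulated data, using VIF = 1 / (1 - R²) from regressing each predictor on the others; the toy variables permit, property, and licenses are invented for illustration and are not the real revenue columns.

```r
# Variance inflation factors by hand on simulated predictors (illustrative only)
set.seed(7)
n <- 30
permit   <- rnorm(n, mean = 100, sd = 20)
property <- 2 * permit + rnorm(n, sd = 10)  # deliberately correlated with permit
licenses <- rnorm(n, mean = 50, sd = 10)
toy <- data.frame(permit, property, licenses)

# Regress each predictor on all the others and convert R-squared to a VIF
vif_manual <- sapply(names(toy), function(v) {
  others <- setdiff(names(toy), v)
  r2 <- summary(lm(reformulate(others, response = v), data = toy))$r.squared
  1 / (1 - r2)
})
vif_manual
```

A VIF close to 1 means a predictor is nearly unrelated to the others; larger values mean its coefficient is estimated less precisely because of the overlap.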
# Simple linear Regression Analysis
model_permit <- lm(IGF ~ Act_Permit, data = Cleaned_4_MMDAs_Data)
model_property <- lm(IGF ~ Act_Property_Rates, data = Cleaned_4_MMDAs_Data)
model_stool <- lm(IGF ~ Act_Stool_Lands, data = Cleaned_4_MMDAs_Data)
model_license <- lm(IGF ~ Act_Licenses, data = Cleaned_4_MMDAs_Data)
model_acts <- lm(IGF ~ Act_Fees, data = Cleaned_4_MMDAs_Data)
# Visualizations
# Scatter plots (IGF vs each land-based revenue)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Permit, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Permit Fees", x = "Permit Fees", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = IGF ~ Act_Permit, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11956474 -8269390 -657231 8675417 16617981
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9896406.0655 2048849.7169 4.830 0.0000526060 ***
## Act_Permit 3.2062 0.4365 7.345 0.0000000844 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9260000 on 26 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.6748, Adjusted R-squared: 0.6623
## F-statistic: 53.95 on 1 and 26 DF, p-value: 0.00000008437
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Property_Rates, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Property Rates", x = "Property Rates", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = IGF ~ Act_Property_Rates, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11986279 -5080689 -511914 6584145 12067284
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5738307.543 1649515.907 3.479 0.00128 **
## Act_Property_Rates 2.441 0.234 10.431 0.00000000000104 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7005000 on 38 degrees of freedom
## Multiple R-squared: 0.7411, Adjusted R-squared: 0.7343
## F-statistic: 108.8 on 1 and 38 DF, p-value: 0.000000000001044
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Stool_Lands, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = IGF ~ Act_Stool_Lands, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15408777 -14095459 -686140 5289834 38947581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16200718.954 2999886.709 5.400 0.00000519 ***
## Act_Stool_Lands 3.734 2.643 1.412 0.167
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14060000 on 34 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.05542, Adjusted R-squared: 0.02764
## F-statistic: 1.995 on 1 and 34 DF, p-value: 0.1669
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Licenses, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Licenses", x = "Licenses", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = IGF ~ Act_Licenses, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9404685 -1971707 -496817 245174 20767083
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -180099.7894 1739000.6330 -0.104 0.918
## Act_Licenses 3.7175 0.2907 12.789 0.00000000000000242 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5978000 on 38 degrees of freedom
## Multiple R-squared: 0.8115, Adjusted R-squared: 0.8065
## F-statistic: 163.6 on 1 and 38 DF, p-value: 0.000000000000002421
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Fees, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Act Fees", x = "Act Fees", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = IGF ~ Act_Fees, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14113758 -4632751 -3669578 3895397 23834709
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4614652.189 2329681.911 1.981 0.0549 .
## Act_Fees 3.229 0.435 7.423 0.00000000663 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8796000 on 38 degrees of freedom
## Multiple R-squared: 0.5918, Adjusted R-squared: 0.5811
## F-statistic: 55.1 on 1 and 38 DF, p-value: 0.000000006631
The simple linear regressions of IGF on each land-based revenue are all statistically significant except the stool lands model (p-value = 0.1669).
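The five simple models follow the same pattern, so they can also be fitted in a loop and summarised in a single table. This is only a sketch under assumptions: it uses the built-in mtcars data as a stand-in, and the predictor names wt, hp, and qsec are placeholders for the Act_* columns.

```r
# Stand-in data and placeholder predictor names (NOT the MMDA columns).
predictors <- c("wt", "hp", "qsec")
response   <- "mpg"

# Fit one simple linear regression per predictor and keep the key numbers.
results <- do.call(rbind, lapply(predictors, function(x) {
  fit <- lm(reformulate(x, response = response), data = mtcars)
  s   <- summary(fit)
  data.frame(
    predictor = x,
    slope     = coef(s)[x, "Estimate"],
    p_value   = coef(s)[x, "Pr(>|t|)"],
    r_squared = s$r.squared,
    n         = nobs(fit)   # rows actually used after dropping NAs
  )
}))

print(results)
```

Keeping `n` in the table makes it visible when models are fitted on different numbers of rows because of missing values.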
ggplot(Cleaned_4_MMDAs_Data, aes(x = factor(Year), y = DACF)) +
geom_point(color = "dodgerblue")+
labs(title = "DACF Trend",
x = "Year",
y = "DACF (Ghana Cedis) ") +
scale_y_continuous(labels = comma)
# Land-Based Revenue Trends
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year)) +
geom_point(aes(y = Act_Permit, color = "Permit Fees")) +
geom_point(aes(y = Act_Property_Rates, color = "Property Rates")) +
geom_point(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue")) +
geom_point(aes(y = Act_Licenses, color = "Licenses")) +
geom_point(aes(y = Act_Fees, color = "Act Fees")) +
labs(
title = "Land-Based Revenue Over Years",
x = "Year (2012 - 2022)",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Set1")+
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plot above shows the trend in each land-based revenue stream over the years.
# DACF vs Land-Based Revenues
model_DACF_land <- lm(DACF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_4_MMDAs_Data)
summary(model_DACF_land)
##
## Call:
## lm(formula = DACF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2315757 -947786 141680 881880 5022482
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2988115.8232 500024.9377 5.976 0.00000624 ***
## Act_Permit -0.2604 0.2897 -0.899 0.3789
## Act_Property_Rates 0.7374 0.3617 2.039 0.0543 .
## Act_Stool_Lands 5.1036 1.8159 2.811 0.0105 *
## Act_Licenses -0.8760 0.3265 -2.683 0.0139 *
## Act_Fees 0.5020 0.2481 2.024 0.0559 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1632000 on 21 degrees of freedom
## (13 observations deleted due to missingness)
## Multiple R-squared: 0.6447, Adjusted R-squared: 0.5601
## F-statistic: 7.62 on 5 and 21 DF, p-value: 0.0003219
cor_matrix_land_DACF <- cor(Cleaned_4_MMDAs_Data[, c("DACF", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_DACF)
## DACF Act_Permit Act_Property_Rates Act_Stool_Lands
## DACF 1.0000000 0.2863056 0.40589255 0.60691291
## Act_Permit 0.2863056 1.0000000 0.92357198 -0.15615766
## Act_Property_Rates 0.4058925 0.9235720 1.00000000 0.04037212
## Act_Stool_Lands 0.6069129 -0.1561577 0.04037212 1.00000000
## Act_Licenses 0.5054713 0.5578915 0.78568005 0.50605306
## Act_Fees 0.6458537 0.3809034 0.59577159 0.68482529
## Act_Licenses Act_Fees
## DACF 0.5054713 0.6458537
## Act_Permit 0.5578915 0.3809034
## Act_Property_Rates 0.7856800 0.5957716
## Act_Stool_Lands 0.5060531 0.6848253
## Act_Licenses 1.0000000 0.9013082
## Act_Fees 0.9013082 1.0000000
corrplot(cor_matrix_land_DACF)
The multiple regression of DACF on the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees) is statistically significant (p-value = 0.0003219), with an R-squared of 0.6447 and an Adjusted R-squared of 0.5601, which indicates a moderately good fit. Of the individual terms, only stool lands revenue and licenses are significant at the 5% level.
The correlation matrix shows that DACF is weakly to moderately correlated with the land-based revenues, with correlations ranging from 0.29 (permit fees) to 0.65 (fees).
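Note that `cor(..., use = "complete.obs")` keeps only the rows that are complete across all six columns, so the matrix is computed on fewer rows than some of the simple regressions. A small sketch with made-up numbers (not the MMDA data) shows how the two missing-value options can give different correlations:

```r
# Toy data with missing values; the numbers are invented for illustration.
df <- data.frame(
  y  = c(1, 2, 3, 4, 5, NA),
  x1 = c(2, 4, 5, 4, 5, 7),
  x2 = c(NA, 1, 3, 2, 5, 6)
)

# Rows with no missing value in any column.
n_complete <- sum(complete.cases(df))

# "complete.obs": every correlation uses the same fully complete rows.
cor_complete <- cor(df, use = "complete.obs")

# "pairwise.complete.obs": each pair uses all rows complete for that pair.
cor_pairwise <- cor(df, use = "pairwise.complete.obs")

cat("Fully complete rows:", n_complete, "of", nrow(df), "\n")
print(round(cor_complete, 3))
print(round(cor_pairwise, 3))
```

With `complete.obs` every entry of the matrix is based on the same rows, which keeps the entries comparable with each other but not necessarily with models fitted on a larger subset.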
# Simple linear Regression Analysis
model_permit <- lm(DACF ~ Act_Permit, data = Cleaned_4_MMDAs_Data)
model_property <- lm(DACF ~ Act_Property_Rates, data = Cleaned_4_MMDAs_Data)
model_stool <- lm(DACF ~ Act_Stool_Lands, data = Cleaned_4_MMDAs_Data)
model_license <- lm(DACF ~ Act_Licenses, data = Cleaned_4_MMDAs_Data)
model_acts <- lm(DACF ~ Act_Fees, data = Cleaned_4_MMDAs_Data)
# Scatter plots (DACF vs each land-based revenue)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Permit, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Permit Fees", x = "Permit Fees", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = DACF ~ Act_Permit, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3868357 -1539649 -218486 2394046 4022825
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4418381.9676 522047.7647 8.464 0.00000000605 ***
## Act_Permit 0.1685 0.1112 1.515 0.142
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2359000 on 26 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.08108, Adjusted R-squared: 0.04574
## F-statistic: 2.294 on 1 and 26 DF, p-value: 0.1419
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Property_Rates, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Property Rates", x = "Property Rates", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = DACF ~ Act_Property_Rates, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3009220 -1922031 -472524 2025856 5183305
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3801414.98663 563638.93422 6.744 0.0000000546 ***
## Act_Property_Rates 0.04404 0.07996 0.551 0.585
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2394000 on 38 degrees of freedom
## Multiple R-squared: 0.007918, Adjusted R-squared: -0.01819
## F-statistic: 0.3033 on 1 and 38 DF, p-value: 0.585
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Stool_Lands, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = DACF ~ Act_Stool_Lands, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3848718 -1353869 -694756 2029283 4927942
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4683693.1644 509518.1393 9.192 0.0000000000963 ***
## Act_Stool_Lands -0.7919 0.4490 -1.764 0.0867 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2389000 on 34 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.08384, Adjusted R-squared: 0.05689
## F-statistic: 3.111 on 1 and 34 DF, p-value: 0.08674
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Licenses, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Licenses", x = "Licenses", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = DACF ~ Act_Licenses, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3408240 -1976839 -234459 1372516 5673693
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2802913.2124 657512.5669 4.263 0.000128 ***
## Act_Licenses 0.2446 0.1099 2.226 0.032024 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2260000 on 38 degrees of freedom
## Multiple R-squared: 0.1154, Adjusted R-squared: 0.09207
## F-statistic: 4.955 on 1 and 38 DF, p-value: 0.03202
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Fees, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Act Fees", x = "Act Fees", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = DACF ~ Act_Fees, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2953467 -1454009 -88285 1254003 5089487
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2289707.66124 530162.41568 4.319 0.000108 ***
## Act_Fees 0.40535 0.09899 4.095 0.000213 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2002000 on 38 degrees of freedom
## Multiple R-squared: 0.3062, Adjusted R-squared: 0.2879
## F-statistic: 16.77 on 1 and 38 DF, p-value: 0.0002128
The simple linear regressions show a statistically significant linear relationship between DACF and licenses (p = 0.032) and between DACF and fees (p = 0.0002). The permit fees, property rates, and stool lands models are not significant at the 5% level.
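A term can behave differently in the simple and the multiple models. Act_Licenses, for example, has a positive slope on its own (0.2446) but a negative one in the multiple DACF model (-0.8760). One reason is that the multiple model estimates each slope with the other predictors held fixed; the two models are also fitted on different numbers of rows. The sketch below illustrates the first effect on the built-in mtcars data, used here purely as a stand-in with placeholder variables.

```r
# Stand-in data: mtcars, NOT the MMDA dataset.
simple   <- lm(mpg ~ hp, data = mtcars)        # hp alone
multiple <- lm(mpg ~ hp + wt, data = mtcars)   # hp adjusted for wt

slope_simple   <- unname(coef(simple)["hp"])
slope_multiple <- unname(coef(multiple)["hp"])

# The two predictors are correlated, so part of the effect attributed to
# hp in the simple model is shared with wt in the multiple model.
cor_hp_wt <- cor(mtcars$hp, mtcars$wt)

cat(sprintf("Slope of hp alone:        %.4f\n", slope_simple))
cat(sprintf("Slope of hp given wt:     %.4f\n", slope_multiple))
cat(sprintf("Correlation of hp and wt: %.3f\n", cor_hp_wt))
```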
# Capital_Expenditure Trend
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Capital_Expenditure)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Trends in Capital Expenditure (Ghana Cedis) Growth ",
x = "Year (2012-2022)",
y = "Capital Expenditure (Ghana Cedis)"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
# Capital_Expenditure vs Land-Based Revenues
model_Capital_Expenditure_land <- lm(Capital_Expenditure ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_4_MMDAs_Data)
summary(model_Capital_Expenditure_land)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Permit + Act_Property_Rates +
## Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15939545 -2593558 -593094 2076780 12876415
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4046511.792 2226379.607 1.818 0.083434 .
## Act_Permit -5.067 1.290 -3.929 0.000770 ***
## Act_Property_Rates 7.618 1.610 4.731 0.000113 ***
## Act_Stool_Lands 10.974 8.085 1.357 0.189106
## Act_Licenses -4.961 1.454 -3.412 0.002623 **
## Act_Fees 2.184 1.105 1.977 0.061343 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7267000 on 21 degrees of freedom
## (13 observations deleted due to missingness)
## Multiple R-squared: 0.6844, Adjusted R-squared: 0.6093
## F-statistic: 9.11 on 5 and 21 DF, p-value: 0.0001001
cor_matrix_land_Capital_Expenditure <- cor(Cleaned_4_MMDAs_Data[, c("Capital_Expenditure", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_Capital_Expenditure)
## Capital_Expenditure Act_Permit Act_Property_Rates
## Capital_Expenditure 1.0000000 0.3497534 0.55614554
## Act_Permit 0.3497534 1.0000000 0.92357198
## Act_Property_Rates 0.5561455 0.9235720 1.00000000
## Act_Stool_Lands 0.3487786 -0.1561577 0.04037212
## Act_Licenses 0.5508003 0.5578915 0.78568005
## Act_Fees 0.5710724 0.3809034 0.59577159
## Act_Stool_Lands Act_Licenses Act_Fees
## Capital_Expenditure 0.34877859 0.5508003 0.5710724
## Act_Permit -0.15615766 0.5578915 0.3809034
## Act_Property_Rates 0.04037212 0.7856800 0.5957716
## Act_Stool_Lands 1.00000000 0.5060531 0.6848253
## Act_Licenses 0.50605306 1.0000000 0.9013082
## Act_Fees 0.68482529 0.9013082 1.0000000
corrplot(cor_matrix_land_Capital_Expenditure)
The multiple regression of Capital_Expenditure on the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees) is statistically significant (p-value = 0.0001001), with an R-squared of 0.6844 and an Adjusted R-squared of 0.6093. The individual terms that are significant are permit fees, property rates, and licenses; stool lands revenue and fees are not.
The correlation matrix shows that Capital_Expenditure is weakly to moderately correlated with the land-based revenues (0.35 to 0.57).
# Simple linear Regression Analysis
model_permit <- lm(Capital_Expenditure ~ Act_Permit, data = Cleaned_4_MMDAs_Data)
model_property <- lm(Capital_Expenditure ~ Act_Property_Rates, data = Cleaned_4_MMDAs_Data)
model_stool <- lm(Capital_Expenditure ~ Act_Stool_Lands, data = Cleaned_4_MMDAs_Data)
model_license <- lm(Capital_Expenditure ~ Act_Licenses, data = Cleaned_4_MMDAs_Data)
model_acts <- lm(Capital_Expenditure ~ Act_Fees, data = Cleaned_4_MMDAs_Data)
# Scatter plots (Capital_Expenditure vs each land-based revenue)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Permit, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Permit Fees", x = "Permit Fees", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Permit, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9889339 -6419099 -4223606 2806572 35875465
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9451494.5758 2449551.5435 3.858 0.000676 ***
## Act_Permit 1.0043 0.5219 1.924 0.065304 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11070000 on 26 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.1247, Adjusted R-squared: 0.09102
## F-statistic: 3.704 on 1 and 26 DF, p-value: 0.0653
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Property_Rates, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Property Rates", x = "Property Rates", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Property_Rates, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12463725 -4365388 -2990184 2284408 34289296
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6947344.3824 2200776.9272 3.157 0.00312 **
## Act_Property_Rates 0.7664 0.3122 2.455 0.01880 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9346000 on 38 degrees of freedom
## Multiple R-squared: 0.1369, Adjusted R-squared: 0.1141
## F-statistic: 6.025 on 1 and 38 DF, p-value: 0.0188
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Stool_Lands, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Stool_Lands, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10487681 -6302318 -3607758 1676627 34837295
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11580336.3488 2230751.4732 5.191 0.0000097 ***
## Act_Stool_Lands -0.3379 1.9656 -0.172 0.865
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10460000 on 34 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.0008684, Adjusted R-squared: -0.02852
## F-statistic: 0.02955 on 1 and 34 DF, p-value: 0.8645
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Licenses, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Licenses", x = "Licenses", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Licenses, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10264968 -4434547 -1643392 2136094 30756350
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3537297.1434 2551648.4018 1.386 0.17374
## Act_Licenses 1.4762 0.4265 3.461 0.00134 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8772000 on 38 degrees of freedom
## Multiple R-squared: 0.2397, Adjusted R-squared: 0.2197
## F-statistic: 11.98 on 1 and 38 DF, p-value: 0.001345
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Fees, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Act Fees", x = "Act Fees", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Fees, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12366404 -5476902 -724847 2431699 26682803
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3864664.3598 2246310.9215 1.720 0.093486 .
## Act_Fees 1.6490 0.4194 3.932 0.000345 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8481000 on 38 degrees of freedom
## Multiple R-squared: 0.2892, Adjusted R-squared: 0.2705
## F-statistic: 15.46 on 1 and 38 DF, p-value: 0.0003454
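The simple linear regressions of capital expenditure on each land-based revenue are statistically significant for property rates (p = 0.0188), licenses (p = 0.0013), and fees (p = 0.0003), but not for permit fees (p = 0.0653) or stool lands revenue (p = 0.8645).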
# Recrrent_Expenditure Trend
ggplot(Cleaned_4_MMDAs_Data, aes(x = Year, y = Recrrent_Expenditure)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
labs(
title = "Trends in Recurrent Expenditure (Ghana Cedis) Growth ",
x = "Year (2012-2022)",
y = "Recurrent Expenditure (Ghana Cedis)"
) +
theme(plot.title = element_text(hjust = 0.5))+
scale_y_continuous(labels = comma)
# Recrrent_Expenditure vs Land-Based Revenues
model_Recrrent_Expenditure_land <- lm(Recrrent_Expenditure ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_4_MMDAs_Data)
summary(model_Recrrent_Expenditure_land)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Permit + Act_Property_Rates +
## Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2906796 -222728 89744 643896 1306131
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2030458.4562 426601.6146 4.760 0.000373 ***
## Act_Permit -4.3745 1.3069 -3.347 0.005249 **
## Act_Property_Rates -1.0340 0.4035 -2.563 0.023624 *
## Act_Stool_Lands 5.3047 2.0618 2.573 0.023169 *
## Act_Licenses 1.8923 0.4307 4.394 0.000726 ***
## Act_Fees 0.6084 0.3617 1.682 0.116473
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1169000 on 13 degrees of freedom
## (21 observations deleted due to missingness)
## Multiple R-squared: 0.9839, Adjusted R-squared: 0.9777
## F-statistic: 158.6 on 5 and 13 DF, p-value: 0.00000000003542
cor_matrix_land_Recrrent_Expenditure <- cor(Cleaned_4_MMDAs_Data[, c("Recrrent_Expenditure", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_Recrrent_Expenditure)
## Recrrent_Expenditure Act_Permit Act_Property_Rates
## Recrrent_Expenditure 1.0000000 0.7877846 0.8585696
## Act_Permit 0.7877846 1.0000000 0.7344373
## Act_Property_Rates 0.8585696 0.7344373 1.0000000
## Act_Stool_Lands 0.9218210 0.7911539 0.7584052
## Act_Licenses 0.9693341 0.8582537 0.9200488
## Act_Fees 0.9729835 0.8285998 0.9096113
## Act_Stool_Lands Act_Licenses Act_Fees
## Recrrent_Expenditure 0.9218210 0.9693341 0.9729835
## Act_Permit 0.7911539 0.8582537 0.8285998
## Act_Property_Rates 0.7584052 0.9200488 0.9096113
## Act_Stool_Lands 1.0000000 0.8766962 0.9042870
## Act_Licenses 0.8766962 1.0000000 0.9772662
## Act_Fees 0.9042870 0.9772662 1.0000000
corrplot(cor_matrix_land_Recrrent_Expenditure)
The multiple regression of Recrrent_Expenditure on the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees) is statistically significant (p-value = 0.00000000003542), with an R-squared of 0.9839 and an Adjusted R-squared of 0.9777. All individual terms are significant at the 5% level except fees (p = 0.116).
The correlation matrix shows that Recrrent_Expenditure is strongly correlated with all the land-based revenues (0.79 to 0.97).
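The multiple model for Recrrent_Expenditure was fitted on 19 rows only (21 observations deleted due to missingness), so its fit statistics are best read with that sample size in mind. A quick way to check how many rows a model actually used, sketched here on the built-in airquality data as a stand-in (its columns are unrelated to the MMDA variables):

```r
# Stand-in data: airquality has missing values in Ozone and Solar.R.
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
fit  <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)

n_total   <- nrow(airquality)
n_used    <- nobs(fit)            # rows kept after lm() drops incomplete rows
n_dropped <- n_total - n_used

# Missing values per column show which variables drive the row loss.
missing_per_column <- colSums(is.na(airquality[, vars]))

cat("Rows in data:", n_total, "\n")
cat("Rows used by the model:", n_used, "\n")
cat("Rows dropped:", n_dropped, "\n")
print(missing_per_column)
```

Applying the same check to each model makes it easier to see when two sets of results are based on different subsets of the data.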
# Simple linear Regression Analysis
model_permit <- lm(Recrrent_Expenditure ~ Act_Permit, data = Cleaned_4_MMDAs_Data)
model_property <- lm(Recrrent_Expenditure ~ Act_Property_Rates, data = Cleaned_4_MMDAs_Data)
model_stool <- lm(Recrrent_Expenditure ~ Act_Stool_Lands, data = Cleaned_4_MMDAs_Data)
model_license <- lm(Recrrent_Expenditure ~ Act_Licenses, data = Cleaned_4_MMDAs_Data)
model_acts <- lm(Recrrent_Expenditure ~ Act_Fees, data = Cleaned_4_MMDAs_Data)
# Scatter plots (Recrrent_Expenditure vs each land-based revenue)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Permit, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recrrent_Expenditure vs Permit Fees", x = "Permit Fees", y = "Recrrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Permit, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6780793 -2515734 -1271037 176050 11314710
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2601870.171 1656190.036 1.571 0.135
## Act_Permit 14.244 2.701 5.273 0.0000621 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4958000 on 17 degrees of freedom
## (21 observations deleted due to missingness)
## Multiple R-squared: 0.6206, Adjusted R-squared: 0.5983
## F-statistic: 27.81 on 1 and 17 DF, p-value: 0.0000621
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Property_Rates, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recrrent_Expenditure vs Property Rates", x = "Property Rates", y = "Recrrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Property_Rates, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6356133 -4883119 -2122550 3913238 13476697
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6545724.1032 1531616.2724 4.274 0.000189 ***
## Act_Property_Rates 1.2078 0.2619 4.611 0.0000747 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5976000 on 29 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.423, Adjusted R-squared: 0.4031
## F-statistic: 21.26 on 1 and 29 DF, p-value: 0.00007471
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Stool_Lands, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recrrent_Expenditure vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Recrrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Stool_Lands, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9539989 -4493707 -2435281 3988509 11152424
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6359801.406 1534294.852 4.145 0.000320 ***
## Act_Stool_Lands 5.292 1.197 4.421 0.000155 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5904000 on 26 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.4292, Adjusted R-squared: 0.4072
## F-statistic: 19.55 on 1 and 26 DF, p-value: 0.0001548
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Licenses, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recrrent_Expenditure vs Licenses", x = "Licenses", y = "Recrrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Licenses, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5708216 -1617768 -256195 1197062 10423314
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1462241.5005 1054317.9810 1.387 0.176
## Act_Licenses 2.2745 0.1957 11.621 0.00000000000197 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3308000 on 29 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.8232, Adjusted R-squared: 0.8171
## F-statistic: 135 on 1 and 29 DF, p-value: 0.000000000001969
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Fees, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recrrent_Expenditure vs Act Fees", x = "Act Fees", y = "Recrrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Fees, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8219613 -3158447 -1642603 2002093 9884854
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4708701.797 1337927.233 3.519 0.00145 **
## Act_Fees 1.762 0.259 6.804 0.00000018 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4882000 on 29 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.6148, Adjusted R-squared: 0.6015
## F-statistic: 46.29 on 1 and 29 DF, p-value: 0.0000001804
The simple linear regressions of recurrent expenditure on each land-based revenue are all statistically significant.
# Population vs Land-Based Revenues
model_Population_land <- lm(Population ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_4_MMDAs_Data)
summary(model_Population_land)
##
## Call:
## lm(formula = Population ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1129645 -134188 -29655 169474 835837
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 498627.16690 127842.97861 3.900 0.000824 ***
## Act_Permit 0.17333 0.07406 2.340 0.029217 *
## Act_Property_Rates -0.26905 0.09247 -2.910 0.008382 **
## Act_Stool_Lands 0.49595 0.46427 1.068 0.297545
## Act_Licenses 0.19704 0.08348 2.360 0.028014 *
## Act_Fees 0.18674 0.06343 2.944 0.007748 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 417300 on 21 degrees of freedom
## (13 observations deleted due to missingness)
## Multiple R-squared: 0.8938, Adjusted R-squared: 0.8685
## F-statistic: 35.35 on 5 and 21 DF, p-value: 0.000000001547
cor_matrix_land_Population <- cor(Cleaned_4_MMDAs_Data[, c("Population", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_Population)
## Population Act_Permit Act_Property_Rates Act_Stool_Lands
## Population 1.0000000 0.1929115 0.37402979 0.75993866
## Act_Permit 0.1929115 1.0000000 0.92357198 -0.15615766
## Act_Property_Rates 0.3740298 0.9235720 1.00000000 0.04037212
## Act_Stool_Lands 0.7599387 -0.1561577 0.04037212 1.00000000
## Act_Licenses 0.7803958 0.5578915 0.78568005 0.50605306
## Act_Fees 0.8993288 0.3809034 0.59577159 0.68482529
## Act_Licenses Act_Fees
## Population 0.7803958 0.8993288
## Act_Permit 0.5578915 0.3809034
## Act_Property_Rates 0.7856800 0.5957716
## Act_Stool_Lands 0.5060531 0.6848253
## Act_Licenses 1.0000000 0.9013082
## Act_Fees 0.9013082 1.0000000
corrplot(cor_matrix_land_Population)
The multiple regression of Population on the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees) is statistically significant overall (F-statistic = 35.35 on 5 and 21 DF, p-value = 0.000000001547), with a Multiple R-squared of 0.8938 and an Adjusted R-squared of 0.8685. All individual terms are statistically significant at the 5% level except stool lands revenue (p = 0.2975). The model is fitted on 27 observations because 13 rows were dropped for missing values.
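As a quick consistency check, the Adjusted R-squared and F-statistic can be recomputed from the Multiple R-squared and the degrees of freedom printed in the summary. The sketch below uses only the rounded values from the output above, so small rounding differences are expected; the variable names are illustrative.

```r
# Values copied from the summary(model_Population_land) output
r2       <- 0.8938   # Multiple R-squared (rounded as printed)
p        <- 5        # number of predictors
df_resid <- 21       # residual degrees of freedom
n        <- df_resid + p + 1   # observations actually used in the fit

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F-statistic and its upper-tail p-value
f_stat  <- (r2 / p) / ((1 - r2) / df_resid)
p_value <- pf(f_stat, p, df_resid, lower.tail = FALSE)

round(c(n = n, adj_r2 = adj_r2, f_stat = f_stat), 4)
p_value
```

The recomputed `n` also confirms how many of the 40 rows survived listwise deletion in this model.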
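The correlation matrix shows that Population is strongly correlated with stool lands revenue (0.76), licenses (0.78), and fees (0.90), but only weakly with permit fees (0.19) and property rates (0.37). Several predictors are also highly correlated with each other, notably permit fees with property rates (0.92) and licenses with fees (0.90), so the individual coefficients in the multiple regression should be interpreted with caution.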
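High correlations among predictors are usually quantified with variance inflation factors (VIFs). The following is a minimal sketch in base R, run on hypothetical toy data (`toy`, `x1`, `x2`, `x3` are made up for illustration and are not the study's variables); `vif_by_hand` is an illustrative helper, not a function used elsewhere in this analysis.

```r
set.seed(1)

# Hypothetical toy data: x2 is constructed to track x1 closely, x3 is independent
toy <- data.frame(x1 = rnorm(30))
toy$x2 <- toy$x1 + rnorm(30, sd = 0.3)
toy$x3 <- rnorm(30)

# VIF of a predictor: regress it on the other predictors, then 1 / (1 - R^2)
vif_by_hand <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_by_hand(toy)
round(vifs, 2)
```

On the real data, the same helper could be applied to the five revenue columns after `na.omit()`, so that every auxiliary regression uses the same rows. If the `car` package is installed, `car::vif(model_Population_land)` should give the equivalent figures directly.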
# Simple linear Regression Analysis
model_permit <- lm(Population ~ Act_Permit, data = Cleaned_4_MMDAs_Data)
model_property <- lm(Population ~ Act_Property_Rates, data = Cleaned_4_MMDAs_Data)
model_stool <- lm(Population ~ Act_Stool_Lands, data = Cleaned_4_MMDAs_Data)
model_license <- lm(Population ~ Act_Licenses, data = Cleaned_4_MMDAs_Data)
model_acts <- lm(Population ~ Act_Fees, data = Cleaned_4_MMDAs_Data)
# Scatter plots (Population vs each land-based revenue)
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Permit, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Permit Fees", x = "Permit Fees", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_permit)
##
## Call:
## lm(formula = Population ~ Act_Permit, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1273147 -954794 -305411 1113550 2059494
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1490716.28592 256122.68831 5.820 0.00000392 ***
## Act_Permit 0.05671 0.05456 1.039 0.308
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1158000 on 26 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.03989, Adjusted R-squared: 0.002968
## F-statistic: 1.08 on 1 and 26 DF, p-value: 0.3082
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Property_Rates, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Property Rates", x = "Property Rates", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = Population ~ Act_Property_Rates, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1175119 -997843 -743555 873337 2244767
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1402733.772410 275921.623826 5.084 0.0000102 ***
## Act_Property_Rates -0.005057 0.039146 -0.129 0.898
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1172000 on 38 degrees of freedom
## Multiple R-squared: 0.0004391, Adjusted R-squared: -0.02587
## F-statistic: 0.01669 on 1 and 38 DF, p-value: 0.8979
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Stool_Lands, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = Population ~ Act_Stool_Lands, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1285555 -947479 -443110 531319 2373153
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1575521.5220 243926.0954 6.459 0.00000022 ***
## Act_Stool_Lands -0.3541 0.2149 -1.647 0.109
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1144000 on 34 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.07392, Adjusted R-squared: 0.04668
## F-statistic: 2.714 on 1 and 34 DF, p-value: 0.1087
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Licenses, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Licenses", x = "Licenses", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = Population ~ Act_Licenses, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1638157 -916361 69334 779693 1753212
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 364545.38622 279289.85654 1.305 0.199653
## Act_Licenses 0.20148 0.04668 4.316 0.000109 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 960100 on 38 degrees of freedom
## Multiple R-squared: 0.3289, Adjusted R-squared: 0.3113
## F-statistic: 18.63 on 1 and 38 DF, p-value: 0.0001095
ggplot(Cleaned_4_MMDAs_Data, aes(x = Act_Fees, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Act Fees", x = "Act Fees", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = Population ~ Act_Fees, data = Cleaned_4_MMDAs_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1417682 -738447 194606 518777 1251643
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 246655.64196 210187.46354 1.174 0.248
## Act_Fees 0.26290 0.03924 6.699 0.0000000629 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 793600 on 38 degrees of freedom
## Multiple R-squared: 0.5415, Adjusted R-squared: 0.5294
## F-statistic: 44.88 on 1 and 38 DF, p-value: 0.00000006295
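The simple linear regressions of Population on each land-based revenue show that only licenses (p = 0.000109, R-squared = 0.3289) and fees (p = 0.0000000629, R-squared = 0.5415) are statistically significant. Permit fees (p = 0.308), property rates (p = 0.898), and stool lands revenue (p = 0.109) are not. Each simple regression uses the rows available for its own predictor (28 to 40 observations), so these results are not directly comparable with the 27-observation multiple regression.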
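Rather than reading five separate summaries, the slope, p-value, R-squared, and number of observations used can be collected into one table. The sketch below runs on hypothetical toy data (`toy`, `y`, `a`, `b` are made up for illustration); it also shows how missing values change the sample size from one model to the next.

```r
set.seed(42)

# Hypothetical toy data standing in for the real data frame
toy <- data.frame(y = rnorm(40), a = rnorm(40), b = rnorm(40))
toy$y <- toy$y + 2 * toy$a   # y depends on a only
toy$b[1:8] <- NA             # b has missing values, so its model uses fewer rows

predictors <- c("a", "b")

# Fit one simple regression per predictor and keep the key statistics
results <- do.call(rbind, lapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "y"), data = toy)
  co  <- summary(fit)$coefficients
  data.frame(
    predictor = v,
    n         = nobs(fit),            # rows actually used after dropping NAs
    slope     = co[v, "Estimate"],
    p_value   = co[v, "Pr(>|t|)"],
    r_squared = summary(fit)$r.squared
  )
}))

results
```

For the real analysis, `predictors` would hold the five revenue column names and `"y"` would be replaced by `"Population"`, with `data = Cleaned_4_MMDAs_Data`.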